CONCENTRATION OF PERMANENT ESTIMATORS FOR
Let $A_n = (a_{ij})_{i,j=1}^n$ be an $n \times n$ positive matrix with entries in
$[a, b]$, $0 < a < b$. Let $X_n = (\sqrt{a_{ij}}\, x_{ij})_{i,j=1}^n$ be a random matrix, where
$\{x_{ij}\}$ are i.i.d. $N(0, 1)$ random variables. We show that for large $n$,
$\det(X_n^T X_n)$ concentrates sharply at the permanent of $A_n$, in the sense
that $n^{-1} \log(\det(X_n^T X_n)/\operatorname{per} A_n) \to_{n \to \infty} 0$ in probability.

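1. Introduction. For a set $F \subset \mathbb{R}$ and integers $n \ge m$, denote by $M(n, m, F)$
the set of $n \times m$ matrices with entries in $F$. Put $M(n, F) = M(n, n, F)$.
Let $S_n$ be the symmetric group of permutations acting on $\{1, \ldots, n\}$. For
$A \in M(n, \mathbb{C})$, the permanent of $A$ is defined as

$$\operatorname{per} A = \sum_{\sigma \in S_n} \prod_{i=1}^n a_{i\sigma(i)}.$$
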
The permanent of a 0-1 matrix is of fundamental importance in combinato- 
rial counting problems. The computation of the permanent of a 0-1 matrix 
was shown to be a #P-complete problem [15], and, hence, (under standard 
complexity-theoretical assumptions) not possible in polynomial time. Since 
then the focus has shifted to randomized approximation methods. The most 
fruitful method available at present is that of the Markov chain Monte Carlo. 
In a recent paper, Jerrum, Sinclair and Vigoda [10] refined the Markov chain
Monte Carlo method to obtain a fully polynomial randomized approximation
scheme for computing the permanent of an arbitrary nonnegative matrix.

A second probabilistic method was derived from the following basic
observation. Assume $\{x_{ij}\}_{1 \le i,j \le n}$ are independent random variables satisfying

$$E(x_{ij}) = 0, \qquad E(x_{ij}^2) = 1. \tag{1.1}$$

For $A \in M(n, \mathbb{R}_+)$, let

$$X(A) = (\sqrt{a_{ij}}\, x_{ij}), \qquad Z(A) = X(A)^T X(A). \tag{1.2}$$

Then (see [2]),

$$E(\det Z(A)) = \operatorname{per} A. \tag{1.3}$$



In other words, $\det Z(A)$ is an unbiased estimator of the permanent. The
computational advantage of this estimator lies in the well-known fact that
the determinant of a large matrix can be computed in polynomial time. If the
$x_{ij}$ are Bernoulli with $x_{ij} \in \{1, -1\}$, then the above estimator is called the
Godsil-Gutman estimator [7]. In [2], Barvinok considers the concentration of the
estimator (1.3) in the case where the $x_{ij}$ are Gaussian, complex Gaussian and
quaternionic Gaussian. (Of course, moving from real to complex, quaternion or
higher-dimensional Gaussians entails some adjustments in the algorithm's
description. Namely, the $x_{ij}^2$ appearing in (1.1) should be replaced with $|x_{ij}|^2$
for an appropriate norm-square, and the determinant which makes up the
basic estimator should be redefined accordingly. We refer to [3] for a
complete discussion of this point.) More precisely, for any $\delta > 0$, Barvinok shows
that



where $\gamma \approx 0.28$ if the $x_{ij}$ are Gaussian, $\gamma \approx 0.56$ if the $x_{ij}$ are complex Gaussian
and $\gamma \approx 0.76$ if they are quaternionic Gaussian. In a more recent preprint [3],
Barvinok suggests the possibility of taking $\log \gamma$ to be any small negative number if
each $x_{ij}$ is replaced by a $k \times k$ random matrix with Gaussian entries, provided
that $k$ is a large enough integer. Along these lines, the work of [4] chooses
the $x_{ij}$ to be random signed basis elements of a Clifford algebra (of dimension
on the order of $n^2$) and proves that in this case $E[\det Z(A_n)^2]/E[\det Z(A_n)]^2$
is bounded independently of n. Such control of the second moment of the 
estimator provides concentration via Chebyshev's inequality. Further, since 
Clifford algebras have representations in terms of real, complex or quater- 
nion matrices of appropriate size, the results of [4] imply that there is very 
good concentration for real matrices of dimension polynomial in n when 
the matrices are selected from a set of basis matrices. However, it remains 
an open question whether this Clifford algebra estimator can be efficiently 
computed at large dimension. In a sense, both [3] and [4] are guided by 
the same principle: introducing more randomness at the level of the entries 
should produce additional averaging and so sharpen the concentration. 

In the present note we take a different approach. Our goal is to show that, 
in fact, good concentration is already present with k = 1, if one is willing 
to look at a restricted class of matrices. In particular, we consider the case
where the entries of $A_n$ are uniformly bounded both from above and away
from zero. We do this in a slightly more general framework, considering 
rectangular as well as square matrices. A consequence of our main result, 
Theorem 2.1, is the following: 

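Corollary 1.1. Let $0 < a < b$ be given. Assume $\{x_{ij}\}$ are independent
identically distributed $N(0, 1)$ random variables. Then, for any $\delta > 0$,

$$\lim_{n \to \infty} \sup_{A_n \in M(n, [a,b])} P\left( \left| \frac{1}{n} \log \frac{\det Z(A_n)}{\operatorname{per} A_n} \right| > \delta \right) = 0,$$

where $Z(A_n)$ is defined by (1.2).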

Note that while the restriction $a > 0$ is stronger than one would like (it
precludes the important 0-1 matrices), some sort of condition on the entries
of the matrices is needed, as the example of $A_n = I_n$ shows. We will have
more to say on this point later on.

It has recently been pointed out to us that for the case considered here, 
namely, with entries bounded above and below, the algorithm of [11] can 
be adapted to yield a polynomial time ($O(n^4)$) algorithm with polynomially
bounded error for computing the permanent. Still, we believe there is an 
intrinsic interest in the present analysis of Barvinok's algorithm. On one 
hand, there is the inherent simplicity of the algorithm, with worst case per- 
formance bounds, and our results give improved performance for a restricted 
class of matrices. On the other hand, a study of the algorithm's performance 
leads directly to rather delicate questions regarding the spectrum of a cer- 
tain class of random matrices. Indeed, our proof of the above corollary is 
based on recent concentration results for nice functionals of the spectral 
measure of random matrices [8]. However, since the function $\log(\cdot)$ is not
nice enough (it is not globally Lipschitz), a more detailed analysis has to be
performed to evaluate the behavior of the bottom or so-called hard edge of
the spectrum of $Z(A_n)$. This analysis, which is inspired by ideas of Bai [1],
introduces some refinements of current concentration techniques which, we
believe, are interesting in their own right and may be applied in other
contexts. Indeed, followers of the random matrix theory literature will recognize
the $Z(A_n)$ matrices considered here as a natural class of perturbations of
the well-known Wishart or Laguerre ensembles.

The structure of the paper is as follows. In Section 2 we introduce our 
general model of rectangular matrices, state our main theorem, present the 
basic concentration result we need, and show how the main theorem follows 
as soon as an integrability condition of the lower tail of the spectrum of 
$Z(A_n)$ is verified; see Condition 2.1. Section 3 is devoted to the verification
of Condition 2.1 under appropriate assumptions on the entries of $A_n$. In
Section 4 we study the flat case $J_{nm} \in M(n, m, [1,1])$, $n \ge m$. Of course,
in this case $\operatorname{per} J_{nm} = n!/(n-m)!$. Our purpose is to point out that for
rectangular $J_{nm}$, where $m \le \theta n$, $\theta < 1$, one immediately gets better concentration
than Corollary 1.1. For general $n \ge m$, we show that a simple polynomial
sampling approximates $\operatorname{per} J_{nm}$ to within an error of order one, also tighter
than the result in Corollary 1.1. Finally, in the Appendix, we present a more
complete study of the lower tail of the spectrum of $Z(J_n)$ by taking advantage
of its integrable structure. This analysis, which possesses independent
interest, reveals that our Condition 2.1 needed in the course of the proof of
Theorem 2.1 is arguably a mild condition.

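2. Preliminaries and main result. Let $A \in M(n, m, \mathbb{R}_+)$ (recall that then
$m \le n$). Let also $Q_{m,n}$ denote the set of all strictly increasing sequences
$\alpha = \{\alpha_1, \ldots, \alpha_m\} \subset \langle n \rangle$, where $\langle n \rangle = \{1, 2, \ldots, n\}$, and set
$A[\alpha, \langle m \rangle] = (a_{\alpha_i j}) \in M(m, \mathbb{R}_+)$. Then, we define the permanent of $A$ as

$$\operatorname{per} A = \sum_{\alpha \in Q_{m,n}} \operatorname{per} A[\alpha, \langle m \rangle].$$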

If A is a 0-1 matrix, then per A counts the matchings of the corresponding 
bipartite graph. 
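
The rectangular definition can be made concrete with a brute-force computation. The following is only an illustrative sketch (exponential time, suitable for tiny matrices); the function names `permanent_square` and `permanent_rect` are ours and do not appear in the paper.

```python
from itertools import combinations, permutations

def permanent_square(rows):
    """Permanent of a square matrix (list of rows), by summing over all permutations."""
    m = len(rows)
    total = 0
    for sigma in permutations(range(m)):
        term = 1
        for i in range(m):
            term *= rows[i][sigma[i]]
        total += term
    return total

def permanent_rect(a):
    """per A for an n x m matrix with n >= m: the sum of the permanents of the
    m x m submatrices obtained by choosing m rows in increasing order."""
    n, m = len(a), len(a[0])
    return sum(permanent_square([a[i] for i in alpha])
               for alpha in combinations(range(n), m))

flat = [[1, 1], [1, 1], [1, 1]]   # the flat 3 x 2 matrix
print(permanent_rect(flat))       # n!/(n-m)! = 3!/1! = 6
```

For a 0-1 matrix the same function counts the ways of matching every column to a distinct row along nonzero entries.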

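For $A \in M(n, \mathbb{R}_+)$, and random variables $\{x_{ij}\}$ satisfying (1.1), the
identity (1.3) is immediate; see, for example, [2]. In fact, (1.3) extends to the
rectangular case. Indeed, for $A \in M(n, m, \mathbb{R}_+)$, define $X(A) = (\sqrt{a_{ij}}\, x_{ij})$ and
$Z(A) = X(A)^T X(A)$ as before. Then, using the Cauchy-Binet formula, one
finds

$$\begin{aligned}
E(\det Z(A)) &= E\Big( \sum_{\alpha \in Q_{m,n}} \det X[\alpha, \langle m \rangle]^T X[\alpha, \langle m \rangle] \Big) \\
&= \sum_{\alpha \in Q_{m,n}} E\big(\det X[\alpha, \langle m \rangle]^T \det X[\alpha, \langle m \rangle]\big) \\
&= \sum_{\alpha \in Q_{m,n}} \operatorname{per} A[\alpha, \langle m \rangle] = \operatorname{per} A,
\end{aligned}$$

proving that (1.3) holds true in this case as well.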
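
As a sanity check on the unbiasedness identity (1.3) in the rectangular case, one can average independent draws of $\det Z(A)$ for a small matrix and compare with a brute-force permanent. This is a minimal Monte Carlo sketch using only the standard library; the helper names, the example matrix and the number of draws are illustrative assumptions, not part of the paper.

```python
import math
import random
from itertools import combinations, permutations

def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def permanent_rect(a):
    """Brute-force permanent of an n x m matrix (n >= m)."""
    n, m = len(a), len(a[0])
    return sum(
        sum(math.prod(a[alpha[i]][s[i]] for i in range(m))
            for s in permutations(range(m)))
        for alpha in combinations(range(n), m))

def det_z(a, rng):
    """One draw of det Z(A), with Z(A) = X(A)^T X(A), X(A)_ij = sqrt(a_ij) * x_ij."""
    n, m = len(a), len(a[0])
    x = [[math.sqrt(a[i][j]) * rng.gauss(0.0, 1.0) for j in range(m)]
         for i in range(n)]
    z = [[sum(x[i][p] * x[i][q] for i in range(n)) for q in range(m)]
         for p in range(m)]
    return det(z)

def estimate(a, draws, seed=0):
    """Average of `draws` independent copies of det Z(A)."""
    rng = random.Random(seed)
    return sum(det_z(a, rng) for _ in range(draws)) / draws

A = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.0]]   # entries in [a, b] = [1, 2]
print(permanent_rect(A), estimate(A, 20000, seed=1))
```

The two printed numbers should be close; how close for a given number of draws is exactly the concentration question studied in the paper.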
Our main theorem can now be stated: 



Theorem 2.1. Let $0 < a < b$ be given. Assume $x_{ij}$, $i, j = 1, \ldots$, are
independent identically distributed $N(0, 1)$ random variables. Then

$$\lim_{n \to \infty} \sup_{A_{n,m} \in M(n,m,[a,b])} P\left( \frac{1}{n} \left| \log \det Z(A_{n,m}) - \log \operatorname{per} A_{n,m} \right| > \delta \right) = 0, \tag{2.1}$$

for any $\delta > 0$.
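
The flavor of (2.1) can be seen numerically in the flat square case $J_n$ (all entries equal to 1, the boundary case $a = b = 1$), where no permanent computation is needed: Section 4 recalls that $\det Z(J_n)$ is distributed as a product of independent $\chi^2_k$ variables, $k = 1, \ldots, n$, and $\operatorname{per} J_n = n!$. The sketch below relies on that identity; the function name and the choice $n = 400$ are illustrative assumptions.

```python
import math
import random

def normalized_log_ratio(n, rng):
    """One draw of (1/n) * log(det Z(J_n) / per J_n) in the flat square case,
    sampling det Z(J_n) as chi2_n * chi2_{n-1} * ... * chi2_1 and using per J_n = n!.
    A chi-square with k degrees of freedom is Gamma(shape k/2, scale 2)."""
    log_det = sum(math.log(rng.gammavariate(k / 2.0, 2.0)) for k in range(1, n + 1))
    return (log_det - math.lgamma(n + 1)) / n

rng = random.Random(0)
samples = [normalized_log_ratio(400, rng) for _ in range(20)]
print(max(abs(s) for s in samples))   # small compared with 1 for large n
```

Working with sums of logarithms avoids overflow, since $\det Z(J_n)$ itself is of order $n!$.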
Part of the proof of Theorem 2.1 hinges on tailoring certain concentration
of measure results to the present setting. To describe these results we must
introduce some notation. First, let $S(m, \mathbb{R}) \subset M(m, \mathbb{R})$ be the set of
(real) symmetric matrices and, for any $B \in S(m, \mathbb{R})$, denote by
$\lambda_1(B) \le \lambda_2(B) \le \cdots \le \lambda_m(B)$ the eigenvalues of $B$ counted with their multiplicities.
Recall the spectral factorization $B = Q D Q^T$ with orthogonal $Q$, its adjoint
$Q^T$ and $D$ the diagonal matrix of eigenvalues. This allows one to view a real
valued function $f$ on $\mathbb{R}$ as a function from $S(m, \mathbb{R})$ into $S(m, \mathbb{R})$ via
$f(B) = Q f(D) Q^T$, where $f(D)$ is again diagonal with entries $f(\lambda_1(B)), f(\lambda_2(B))$,
and so on. And so, along with $\operatorname{trace} B = \sum_{i=1}^m \lambda_i(B)$, we may define

$$\operatorname{trace} f(B) = \sum_{i=1}^m f(\lambda_i(B)), \qquad \det f(B) = \prod_{i=1}^m f(\lambda_i(B)).$$

Next, for $f : \mathbb{R} \to \mathbb{R}$, bring in the Lipschitz norm

$$|f|_{\mathcal{L}} = \sup_{x < y} \frac{|f(x) - f(y)|}{|x - y|},$$

a function $f$ being referred to as Lipschitz when $|f|_{\mathcal{L}} < \infty$. Lastly, recall that
a measure $\nu$ on $\mathbb{R}$ is said to satisfy the logarithmic Sobolev inequality with
constant $c$ if, for any differentiable function $f$,

$$\int_{-\infty}^{\infty} f^2 \log \frac{f^2}{\int f^2 \, d\nu} \, d\nu \le 2c \int_{-\infty}^{\infty} |f'|^2 \, d\nu. \tag{2.2}$$


The general concentration result of [8], which makes up the backbone of our 
proof may now be introduced. 

Assume that X G M(n, m, R) with all Xij mutually independent with laws 
satisfying the logarithmic Sobolev inequality with uniformly bounded con- 
stant c. For Z = (-^X) T (^X) and / Lipschitz, Corollary 1.8(b) of [8] 
states that 
1 



(2.3) P 



n 



trace / (Z) -E 



1 



n 



trace f(Z) 



> 5 < 2 exp 

n 



5 2 (n + m 



i2n 



2c/ 2 



c 



for any $\delta > 0$. For us the individual entries of $X = X(A)$ are Gaussian,
which are well known to satisfy (2.2). On the other hand, we would like to
apply (2.3) with $f$ the logarithm, which is not Lipschitz. This is
circumvented by introducing a cutoff: for fixed $\varepsilon > 0$, define

$$\log_\varepsilon x = \log(x \vee \varepsilon),$$

which is Lipschitz. Along with this we set $\det_\varepsilon(B) = \prod_{i=1}^m (\lambda_i(B) \vee \varepsilon)$.
Finally, for $A = (a_{ij}) \in M(n, m, \mathbb{R}_+)$, we define $\tilde{A} = A/(n+m)$ and remark
that since

$$\frac{\det Z(A)}{\operatorname{per} A} = \frac{\det Z(\tilde{A})}{\operatorname{per} \tilde{A}},$$

it is enough, when proving Theorem 2.1, to consider matrices
$\tilde{A}_{n,m} \in M(n, m, [a/(n+m), b/(n+m)])$.
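
To make the cutoff concrete: the function $x \mapsto \log(x \vee \varepsilon)$ has Lipschitz constant $1/\varepsilon$, and the difference between the cutoff log-determinant and the true one is exactly the sum of $\log(\varepsilon/\lambda_i)$ over the eigenvalues below $\varepsilon$. The sketch below checks both facts on a hand-picked spectrum; the function names and the sample eigenvalues are illustrative assumptions.

```python
import math

def log_eps(x, eps):
    """Cutoff logarithm log(max(x, eps)); Lipschitz on [0, inf) with constant 1/eps."""
    return math.log(max(x, eps))

def log_det_eps(eigenvalues, eps):
    """Logarithm of det_eps(B), given the eigenvalues of B."""
    return sum(log_eps(lam, eps) for lam in eigenvalues)

def cutoff_gap(eigenvalues, eps):
    """log det_eps(B) - log det(B): only eigenvalues below eps contribute."""
    return sum(math.log(eps / lam) for lam in eigenvalues if lam < eps)

eigs = [1e-6, 0.01, 0.5, 2.0]
eps = 0.05
true_log_det = sum(math.log(lam) for lam in eigs)
print(log_det_eps(eigs, eps) - true_log_det, cutoff_gap(eigs, eps))  # two equal numbers
```

The second function is the quantity whose size must be controlled when the cutoff is removed.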

With that said, the form of the general concentration result (2.3) that we 
will need is stated next as a lemma, the proof of which is deferred to the 
end of the section. 

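Lemma 2.1. Under the assumptions of Theorem 2.1, let $\varepsilon \in (0, 8b^2)$ and
$0 < s_n$, $n = 1, \ldots$, be a sequence diverging to $\infty$. Then, for any $\delta > 0$,

$$\lim_{n \to \infty} \sup_{A_{n,m} \in M(n,m,[0,b])} P\left( \frac{1}{s_n} \left| \log \det\nolimits_\varepsilon Z(\tilde{A}_{n,m}) - \log E[\det\nolimits_\varepsilon Z(\tilde{A}_{n,m})] \right| > \delta \right) = 0. \tag{2.4}$$

The statement remains true if $\varepsilon = \varepsilon_n \to 0$ as $n \to \infty$, so long as $s_n \varepsilon_n^2 \to \infty$.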

That is, concentration holds at any rate if the small eigenvalues are ig- 
nored by way of the cutoff logarithm. Extending beyond the cutoff requires 
the following integrability condition alluded to above. 

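Let $\mathcal{A}_{n,m} \subset M(n, m, [0, b])$.

Condition 2.1. There exist sequences $\varepsilon_n \to 0$, $s_n \to \infty$, such that
$s_n \varepsilon_n^2 \to \infty$ and

$$\limsup_{n \to \infty} \sup_{A_{n,m} \in \mathcal{A}_{n,m}} P\left( \frac{1}{s_n} \sum_{\lambda_i(Z(\tilde{A}_{n,m})) < \varepsilon_n} \log \frac{\varepsilon_n}{\lambda_i(Z(\tilde{A}_{n,m}))} > \delta \right) = 0. \tag{2.5}$$
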

Theorem 2.1 is a direct consequence of the following two propositions. 

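Proposition 2.1. Fix $b < \infty$ and assume that $\mathcal{A}_{n,m}$ satisfies Condition 2.1.
Then, for any $\delta > 0$,

$$\lim_{n \to \infty} \sup_{A_{n,m} \in \mathcal{A}_{n,m}} P\left( \frac{1}{s_n} \left| \log \det Z(\tilde{A}_{n,m}) - \log E[\det Z(\tilde{A}_{n,m})] \right| > \delta \right) = 0. \tag{2.6}$$

Proposition 2.2. For $0 < a < b < \infty$, the class of matrices $M(n, m, [a, b])$
satisfies Condition 2.1 with $s_n = n$ and $\varepsilon_n = (\log n)^{-4}$.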

Certainly it is of theoretical interest to extend the result of Proposition 2.2 
and, thus, Theorem 2.1 to classes of matrices allowing some number of zero 
entries. As indicated above, there also exist important applied problems for 
which such a result would be of great use. In this direction we have only the 
following observation in the case of what we refer to as strictly rectangular 
matrices. It was pointed out to us, together with its proof, by Silverstein.

Proposition 2.3. Consider the class of matrices contained in $M(n, m, [0, b])$
such that $m = m_n$ satisfies $\lim_{n \to \infty} (m/n) = \theta < 1$. Restrict further to
the subset of those matrices such that the maximum number of zero entries
in any column is bounded by $\lceil \gamma n \rceil$ for all large $n$, with all other entries
contained in an interval $[a, b]$ bounded away from zero. If, moreover, $\gamma < 1 - \theta$,
then Condition 2.1 is satisfied with $s_n = n$ and any $\varepsilon_n \to 0$.

We conclude this section with the proofs of Proposition 2.1 and its sup- 
porting Lemma 2.1. The proofs of Propositions 2.2 and 2.3 are deferred to 
Section 3. 

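Proof of Proposition 2.1. Take an element $A_{n,m} \in \mathcal{A}_{n,m}$ and for
fixed $\delta > 0$ define the numbers

$$g_n(\varepsilon_n, \delta) := P\left( \frac{1}{s_n} \left| \log \det\nolimits_{\varepsilon_n} Z(\tilde{A}_{n,m}) - \log E[\det\nolimits_{\varepsilon_n} Z(\tilde{A}_{n,m})] \right| > \delta \right)$$

and

$$h_n(\varepsilon_n, \delta) := P\left( \frac{1}{s_n} \big( \log \det\nolimits_{\varepsilon_n} Z(\tilde{A}_{n,m}) - \log \det Z(\tilde{A}_{n,m}) \big) > \delta \right)
= P\left( \frac{1}{s_n} \sum_{\lambda_i(Z(\tilde{A}_{n,m})) < \varepsilon_n} \log \frac{\varepsilon_n}{\lambda_i(Z(\tilde{A}_{n,m}))} > \delta \right),$$

appearing in the simple bound

$$P\left( \frac{1}{s_n} \left| \log \det Z(\tilde{A}_{n,m}) - \log E[\det\nolimits_{\varepsilon_n} Z(\tilde{A}_{n,m})] \right| > 2\delta \right) \le g_n(\varepsilon_n, \delta) + h_n(\varepsilon_n, \delta). \tag{2.7}$$

Next note that, as long as $\operatorname{per} \tilde{A}_{n,m} > 0$, one may apply Chebyshev's inequality
to the ratio $\det Z(\tilde{A}_{n,m}) / \operatorname{per} \tilde{A}_{n,m}$ to produce

$$P\left( \frac{1}{s_n} \big( \log \det Z(\tilde{A}_{n,m}) - \log \operatorname{per} \tilde{A}_{n,m} \big) > 2\delta \right) \le e^{-2\delta s_n} \tag{2.8}$$

for $n = 1, 2, \ldots$. Both here and above we are interested in $s_n \uparrow \infty$ while $\varepsilon_n \downarrow 0$.

Now take a small positive $\varepsilon' < 1/4$ and notice that by Condition 2.1 and
Lemma 2.1, there exists a large enough integer $N(\delta, \varepsilon')$ so that

$$\sup_{A_{n,m} \in \mathcal{A}_{n,m}} \{ g_n(\varepsilon_n, \delta) + h_n(\varepsilon_n, \delta) \} \le \varepsilon' \quad \text{and} \quad e^{-2\delta s_n} \le \varepsilon' \quad \text{for all } n \ge N(\delta, \varepsilon').$$

Hence, for each $A_{n,m} \in \mathcal{A}_{n,m}$ and each $n \ge N(\delta, \varepsilon')$, the set of $Z(\tilde{A}_{n,m})$
satisfying both inequalities

$$\frac{1}{s_n} \left| \log \det Z(\tilde{A}_{n,m}) - \log E[\det\nolimits_{\varepsilon_n} Z(\tilde{A}_{n,m})] \right| \le 2\delta, \qquad
\frac{1}{s_n} \big( \log \det Z(\tilde{A}_{n,m}) - \log \operatorname{per} \tilde{A}_{n,m} \big) \le 2\delta$$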


8 S. FRIEDLAND, B. RIDER AND O. ZEITOUNI 

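has probability at least $1 - 2\varepsilon'$. Further, since
$\operatorname{per} \tilde{A}_{n,m} = E[\det Z(\tilde{A}_{n,m})] \le E[\det\nolimits_{\varepsilon_n} Z(\tilde{A}_{n,m})]$, it follows that

$$\frac{1}{s_n} \left| \log \operatorname{per} \tilde{A}_{n,m} - \log E[\det\nolimits_{\varepsilon_n} Z(\tilde{A}_{n,m})] \right| \le 4\delta \quad \text{for } n \ge N(\delta, \varepsilon').$$

(Note that we deal here with a deterministic difference; hence, if it is bounded
above with positive probability, then it is actually bounded above.) Combining
the above inequalities with (2.7), we deduce that

$$\sup_{A_{n,m} \in \mathcal{A}_{n,m}} P\left( \frac{1}{s_n} \left| \log \det Z(\tilde{A}_{n,m}) - \log \operatorname{per} \tilde{A}_{n,m} \right| > 6\delta \right) \le \varepsilon' \quad \text{for } n \ge N(\delta, \varepsilon'),$$

completing the proof of Proposition 2.1. □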

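Proof of Lemma 2.1. Applying (2.3) with the choice $f = \log_\varepsilon$, we
obtain

$$P\left( \left| \frac{1}{m+n} \sum_{i=1}^m \log_\varepsilon \lambda_i(Z(\tilde{A}_{n,m})) - E\Big[ \frac{1}{m+n} \sum_{i=1}^m \log_\varepsilon \lambda_i(Z(\tilde{A}_{n,m})) \Big] \right| > \delta \right) \le 2 e^{-(m+n)^2 \varepsilon^2 \delta^2 / (8b^2)}. \tag{2.9}$$

The particular form of the right-hand side rests on the readily checked
$|\log_\varepsilon|_{\mathcal{L}} = 1/\varepsilon$ and the well-known fact that a centered Gaussian
distribution has logarithmic Sobolev constant equal to its variance. Next set

$$U = \log \det\nolimits_\varepsilon Z(\tilde{A}_{n,m}) - E[\operatorname{trace} \log_\varepsilon Z(\tilde{A}_{n,m})],$$

and note that (2.9) yields, for any $t > 0$,

$$P(|U| > t) \le 2 e^{-\varepsilon^2 t^2 / (8b^2)}.$$

Thus,

$$E[e^U] \le E[e^{|U|}] \le 1 + \int_0^\infty e^t P(|U| > t) \, dt \le 1 + 2 \int_0^\infty e^{t - \varepsilon^2 t^2/(8b^2)} \, dt \le 1 + 2 e^{2b^2/\varepsilon^2}.$$

We conclude, together with Jensen's inequality, that

$$E[\log \det\nolimits_\varepsilon Z(\tilde{A}_{n,m})] \le \log E[\det\nolimits_\varepsilon Z(\tilde{A}_{n,m})] \le E[\log \det\nolimits_\varepsilon Z(\tilde{A}_{n,m})] + \log\big(1 + 2 e^{2b^2/\varepsilon^2}\big).$$

This, together with (2.9), yields immediately Lemma 2.1 in the case of
fixed $\varepsilon$. But by inspecting the above bound, one sees that even if $\varepsilon = \varepsilon_n \downarrow 0$,
the statement holds so long as the condition $\varepsilon_n^2 s_n \to \infty$ is respected. □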
Remark 2.1. It is natural to ask what can be said about the performance
of the Godsil-Gutman algorithm under the same conditions on the
matrix A. That is, whether the Gaussians in our statement can be replaced 
by ±1 Bernoullis. Toward that, it is true that other concentration results 
of [8] provide a statement similar to (2.3) as long as the individual laws of 
the entries of X are compactly supported. However, as we will see in the 
next section, the isotropic property of the Gaussian is essential in our proof 
of Proposition 2.2. 



3. Controlling the small eigenvalues. We now remove the cutoff intro- 
duced in the logarithm necessary to go from the concentration inequality 
of Lemma 2.1 to our main result. That is, the proof of Proposition 2.2 is 
carried out. In fact, we prove the following slightly stronger statement. 



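Proposition 3.1. For all $\varepsilon$ small enough and all $n \ge m + 3$, it holds
that

$$\sup_{A_n \in M(n,m,[a,b])} E\left[ \frac{1}{n} \sum_{\lambda_i(Z(\tilde{A}_n)) < \varepsilon} \log \frac{1}{\lambda_i(Z(\tilde{A}_n))} \right] \le \frac{\varepsilon |\log \varepsilon|}{a} \, \frac{(n+m)m}{n(n-m-1)}. \tag{3.1}$$

Further, for any $n \ge m$,

$$\limsup_{n \to \infty} \sup_{A_{n,m} \in M(n,m,[a,b])} E\left[ \frac{1}{n} \sum_{\lambda_i(Z(\tilde{A}_n)) < \varepsilon_n} \log \frac{1}{\lambda_i(Z(\tilde{A}_n))} \right] = 0 \tag{3.2}$$

as soon as $\varepsilon_n = (\log n)^{-4}$.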



Indeed, Proposition 2.2 follows from (3.2) by Chebyshev's inequality. Note 
also that in the strictly rectangular case, $\limsup_{n \to \infty} (m/n) < 1$, (3.1) shows
that e n may be taken to go to zero arbitrarily slowly. The proof of Propo- 
sition 2.3 uses a variant of (3.1); the details are reported at the end of this 
section. 

The proof of Proposition 3.1 makes essential use of the following simple 
observations. 



Lemma 3.1. Let $V$ be an element of $M(n, m, \mathbb{R})$ with statistically
independent entries drawn from continuous distributions. Denote by $v_k$ the $k$th
column of $V$ and by $V_k$ the matrix formed by deleting $v_k$ from $V$. Then,
$\det(V^T V) \ne 0$ and $\det(V_k^T V_k) \ne 0$, a.s. Further,

$$\det(V^T V) = \det(V_k^T V_k) \big[ v_k^T (I - V_k (V_k^T V_k)^{-1} V_k^T) v_k \big] =: \det(V_k^T V_k) [v_k^T P_k v_k], \tag{3.3}$$

from which it follows that $[(V^T V)^{-1}]_{kk} = (v_k^T P_k v_k)^{-1}$. $P_k$ is a projection,
almost surely onto a subspace of dimension $n - m + 1$, and $P_k$ and $v_k$ are
independent.

When $V = X(A)$ for $A \in M(n, m, [a, b])$, one has that $v_k = D_k x_k$ with
$D_k = \operatorname{diag}(\sqrt{a_{1k}}, \ldots, \sqrt{a_{nk}})$, and

$$v_k^T P_k v_k = x_k^T D_k P_k D_k x_k = \sum_{i=1}^{n-m+1} \lambda_{i+m-1}(D_k P_k D_k) \, x_{ik}^2, \tag{3.4}$$

where the $\{x_{ik}\}_{i=1,\ldots,n;\ k=1,\ldots,m}$ are independent standard Gaussians and, for each $k$,
$\{x_{ik}\}_{i=1}^n$ and $\{\lambda_i(D_k P_k D_k)\}_{i=1}^n$ are also independent. Furthermore, we have
the bound

$$a \sum_{i=1}^{n-m+1} x_{ik}^2 \le \sum_{i=1}^{n-m+1} \lambda_{i+m-1}(D_k P_k D_k) \, x_{ik}^2 \le b \sum_{i=1}^{n-m+1} x_{ik}^2. \tag{3.5}$$


Proof. The representation (3.3) is commonly exploited in the type of
random matrix estimates required below. See, for example, [1] where it is
used repeatedly. To understand it, recall the interpretation of $\det V^T V$ as the
square of the volume of the parallelepiped spanned by the column vectors
$v_1, \ldots, v_m$. Clearly, this is the same as $\det V_k^T V_k$ times the square of the
length of the projection of $v_k$ onto the space orthogonal to the span of
the columns of $V_k$, but this is just what (3.3) says. That $P_k$ and $v_k$ are
independent is clear from the definitions.

Now in the case of $V = X(A)$, one simply notes that the quadratic form
$x_k^T D_k P_k D_k x_k$ may be diagonalized by setting $\hat{x}_k = Q x_k$ with an appropriate
orthogonal matrix $Q$. By isotropy, the entries of the vector $\hat{x}_k$ remain
independent standard Gaussians. The bound on the eigenvalues follows from
considering the Rayleigh quotient: with $y = D_k^{-1} z$,

$$a \, \frac{z^T P_k z}{z^T z} \le \frac{y^T D_k P_k D_k y}{y^T y} = \frac{z^T P_k z}{z^T D_k^{-2} z} \le b \, \frac{z^T P_k z}{z^T z}.$$

From the min-max theorem, one sees that for all $i$,

$$a \lambda_i(P_k) \le \lambda_i(D_k P_k D_k) \le b \lambda_i(P_k).$$

As $P_k$ is a projection onto an $n - m + 1$ dimensional subspace, $\lambda_i(P_k) = 0$
for $i \le m - 1$ and $\lambda_i(P_k) = 1$ for $i > m - 1$, completing the statement. □
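
The identity (3.3) and its consequence for the diagonal of $(V^T V)^{-1}$ are easy to check numerically. The sketch below computes $v_k^T P_k v_k$ as $|v_k|^2 - b^T (V_k^T V_k)^{-1} b$ with $b = V_k^T v_k$, which is the same quantity written without forming $P_k$ explicitly; all function names and the test dimensions are illustrative assumptions.

```python
import random

def dot(u, w):
    return sum(p * q for p, q in zip(u, w))

def gram(cols):
    """Gram matrix V^T V for V given as a list of column vectors."""
    return [[dot(u, w) for w in cols] for u in cols]

def solve_and_det(mat, rhs):
    """Gauss-Jordan elimination with partial pivoting.
    Returns (solution y of mat @ y = rhs, determinant of mat)."""
    n = len(mat)
    a = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(n):
            if r != c:
                f = a[r][c] / a[c][c]
                for k in range(c, n + 1):
                    a[r][k] -= f * a[c][k]
    return [a[i][n] / a[i][i] for i in range(n)], d

def check_identity(cols, k):
    """Returns (det V^T V, det(V_k^T V_k) * v_k^T P_k v_k,
                [(V^T V)^{-1}]_kk, 1 / (v_k^T P_k v_k))."""
    v = cols[k]
    others = cols[:k] + cols[k + 1:]
    b = [dot(u, v) for u in others]
    y, det_others = solve_and_det(gram(others), b)
    proj = dot(v, v) - dot(b, y)          # squared length of the projected column
    unit = [1.0 if i == k else 0.0 for i in range(len(cols))]
    inv_col, det_full = solve_and_det(gram(cols), unit)
    return det_full, det_others * proj, inv_col[k], 1.0 / proj

rng = random.Random(0)
n, m = 7, 3
cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
print(check_identity(cols, 1))   # first two entries agree, last two entries agree
```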

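Proof of Proposition 3.1. We begin with the rectangular case: $A_n \in M(n, m, [a, b])$,
$n \ge m + 3$ (for economy of space, we will omit the subscript $m$
from $A_{n,m}$). By the monotonicity of $x \log(1/x)$ for $x \in [0, 1/e]$, it follows that
for any positive $\varepsilon < 1/e$,

$$\frac{1}{n} \sum_{\lambda_i(Z(\tilde{A}_n)) < \varepsilon} \log \frac{1}{\lambda_i(Z(\tilde{A}_n))}
\le \frac{\varepsilon |\log \varepsilon|}{n} \sum_{i=1}^m \frac{1}{\lambda_i(Z(\tilde{A}_n))}
= \frac{\varepsilon |\log \varepsilon|}{n} \sum_{j=1}^m [Z(\tilde{A}_n)^{-1}]_{jj} =: M_n. \tag{3.6}$$

By Lemma 3.1 each $[Z(\tilde{A}_n)^{-1}]_{jj}$ is stochastically bounded as in

$$[Z(\tilde{A}_n)^{-1}]_{jj} \le \frac{n+m}{a} \, U_{j,n},$$

where $U_{j,n}$ is distributed as one over a $\chi^2$ random variable with $n - m + 1$
degrees of freedom, the mean of which one can compute exactly:

$$E[U_{j,n}] = \frac{\int_0^\infty r^{(n-m-3)/2} e^{-r/2} \, dr}{\int_0^\infty r^{(n-m-1)/2} e^{-r/2} \, dr} = \frac{1}{n-m-1}.$$

Thus, one finds that for $\varepsilon < 1/e$,

$$E[M_n] \le \frac{\varepsilon |\log \varepsilon|}{a} \, \frac{(n+m)m}{(n-m-1)n},$$

which explains the bound (3.1).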
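
The mean of the reciprocal of a $\chi^2_k$ variable can be evaluated in closed form from the ratio of integrals used above, since $\int_0^\infty r^{s-1} e^{-r/2} \, dr = \Gamma(s) 2^s$. The short sketch below, with a function name of our choosing, confirms that the ratio equals $1/(k-2)$; with $k = n - m + 1$ degrees of freedom this is $1/(n-m-1)$.

```python
import math

def inverse_chi2_mean(k):
    """E[1/chi2_k] as the ratio of the integrals of r^(k/2-2) e^(-r/2) and
    r^(k/2-1) e^(-r/2) over (0, inf), each equal to Gamma(s) * 2^s. Needs k > 2."""
    if k <= 2:
        raise ValueError("the mean of 1/chi2_k is infinite for k <= 2")
    log_num = math.lgamma(k / 2 - 1) + (k / 2 - 1) * math.log(2)
    log_den = math.lgamma(k / 2) + (k / 2) * math.log(2)
    return math.exp(log_num - log_den)

for k in (3, 5, 10, 40):
    print(k, inverse_chi2_mean(k), 1 / (k - 2))
```

The requirement $k > 2$ is the reason some gap between $n$ and $m$ is needed for this step.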

To complete the proof of the proposition, it is enough to consider $n < 2m$.
Take $A_n \in M(n, m, [a, b])$, and denote the columns of $X(\tilde{A}_n)$ by $\tilde{x}_1, \ldots, \tilde{x}_m$
[$\tilde{x}_i = (n+m)^{-1/2} D_i x_i$ in previously used notation]. Recall the following
identity for the determinant from Lemma 3.1:

$$\det(Z(\tilde{A}_n)) = \det(Z((\tilde{A}_n)_1)) [\tilde{x}_1^T P_1 \tilde{x}_1].$$

By $(A_n)_1$ we mean the matrix formed by the last $m - 1$ columns of $A_n$. The
matrix $P_1$ projects onto the $(n - m + 1)$-dimensional space orthogonal to
the span of the columns of $X((A_n)_1)$; it is independent of the vector $\tilde{x}_1$.

The above may be iterated: first applying the identity to $\det(Z((\tilde{A}_n)_1))$
and so on. We take a positive $\theta = \theta_n \ll 1$, and after carrying out this
procedure $\lceil n\theta_n \rceil$ times, we write the outcome as follows:

$$\det(Z(\tilde{A}_n)) = \det(Z(\tilde{B}_n)) \prod_{k=1}^{\lceil n\theta_n \rceil} [\tilde{x}_k^T P_k \tilde{x}_k].$$

Here $B_n \in M(n, \hat{m}_n, [a, b])$ is the matrix formed by the last $\hat{m}_n = m - \lceil n\theta_n \rceil$
columns of $A_n$, and each $P_k$ is an $(n - m + k)$-dimensional projection
independent of $\tilde{x}_k$. The above equality is re-expressed as

$$\begin{aligned}
\frac{1}{n} \sum_{\lambda_i(Z(\tilde{A}_n)) < \varepsilon_n} \log \frac{1}{\lambda_i(Z(\tilde{A}_n))}
&= \frac{1}{n} \sum_{\lambda_i(Z(\tilde{B}_n)) < \varepsilon_n} \log \frac{1}{\lambda_i(Z(\tilde{B}_n))}
- \frac{1}{n} \sum_{k=1}^{m - \hat{m}_n} \log(\tilde{x}_k^T P_k \tilde{x}_k) \\
&\quad + \frac{1}{n} \Big[ \sum_{\lambda_i(Z(\tilde{A}_n)) \ge \varepsilon_n} \log \lambda_i(Z(\tilde{A}_n)) - \sum_{\lambda_i(Z(\tilde{B}_n)) \ge \varepsilon_n} \log \lambda_i(Z(\tilde{B}_n)) \Big] \\
&= I_n + II_n + III_n.
\end{aligned} \tag{3.7}$$

The point is that the estimates obtained for the strictly rectangular case
may be directly applied to $B_n$ and so to $I_n$. That is, we know from (3.1)
that there exists a numerical constant $C_1$ such that

$$E[I_n] \le C_1 \frac{\varepsilon_n |\log \varepsilon_n|}{\theta_n} \tag{3.8}$$

for all sufficiently large $n$. The term $II_n$ may also be handled by previous
considerations. From Lemma 3.1 it follows that, with $a \le \gamma_i \le b$ for all
$i$ and $\{x_i\}$ independent standard Gaussians,

$$-E[\log(\tilde{x}_k^T P_k \tilde{x}_k)] = -E\left[ \log \frac{\gamma_1 x_1^2 + \cdots + \gamma_{n-m+k} x_{n-m+k}^2}{n+m} \right] \le \log(n+m) - \log a - E[\log x_1^2].$$

The last expectation is certainly finite and so there is a constant $C_2$
(depending on $a$ only) such that

$$E[II_n] \le \frac{\lceil n\theta_n \rceil}{n} \, E\left| \log \frac{a x_1^2}{n+m} \right| \le C_2 \theta_n \log n. \tag{3.9}$$

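As for the last term to be bounded, $III_n$, first note that by the interlacing
inequalities, for any $l \le \hat{m}_n$,

$$\lambda_l(Z(\tilde{A}_n)) \le \lambda_l(Z(\tilde{B}_n)) \le \lambda_{l + \lceil n\theta_n \rceil}(Z(\tilde{A}_n)).$$

Thus, if $l^*$ is the smallest $l$ such that $\lambda_l(Z(\tilde{B}_n)) \ge \varepsilon_n$, then
$\lambda_{l^*-1}(Z(\tilde{A}_n)) < \varepsilon_n$ and $\lambda_{l^* + \lceil n\theta_n \rceil}(Z(\tilde{A}_n)) \ge \varepsilon_n$.

Now for each $l$ such that $\lambda_l(Z(\tilde{A}_n)) \ge \varepsilon_n$, the term containing $\log \lambda_l(Z(\tilde{A}_n))$
is paired with the corresponding object in the $B_n$ sum. The contribution to
$III_n$ is

$$\frac{1}{n} \log \frac{\lambda_l(Z(\tilde{A}_n))}{\lambda_l(Z(\tilde{B}_n))} \le 0.$$

In this manner it is possible that the largest $\lceil n\theta_n \rceil$ of the $\lambda_i(Z(\tilde{A}_n))$'s and
the smallest $\lceil n\theta_n \rceil$ of the $\lambda_i(Z(\tilde{B}_n))$'s in $III_n$ remain unpaired. That is,

$$III_n \le \frac{1}{n} \sum_{i = m - \lceil n\theta_n \rceil}^{m} |\log \lambda_i(Z(\tilde{A}_n))| + \frac{1}{n} \sum_{i = l^*}^{l^* + \lceil n\theta_n \rceil} |\log \lambda_i(Z(\tilde{B}_n))|
\le 4\theta_n \max\big( |\log \varepsilon_n|, \log \lambda_m(Z(\tilde{A}_n)) \big) \tag{3.10}$$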



for all $n$ large enough. The random variable remaining on the right-hand
side is, in turn, controlled by

$$E[\lambda_m(Z(\tilde{A}_n))] \le E[\operatorname{trace} Z(\tilde{A}_n)] = \frac{1}{n+m} E\Big[ \sum_{i,j} a_{ij} x_{ij}^2 \Big] \le bn,$$

an admittedly crude but sufficient bound.
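Lastly, (3.8)-(3.10) are combined to produce

$$E\left[ \frac{1}{n} \sum_{\lambda_i(Z(\tilde{A}_n)) < \varepsilon_n} \log \frac{1}{\lambda_i(Z(\tilde{A}_n))} \right]
\le C_1 \frac{\varepsilon_n |\log \varepsilon_n|}{\theta_n} + C_2 \theta_n \log n + C_3 \theta_n (|\log \varepsilon_n| + \log n)$$

for all large enough $n$. The proof is then finished by choosing $\theta_n = (\log n)^{-2}$
and $\varepsilon_n = (\log n)^{-4}$. □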

Proof of Proposition 2.3. For simplicity take $m = m_n = \lceil n\theta \rceil$. Tracing
the proof of the bound (3.1), one comes to the inequality, with $\tilde{A}_n$ as before,

$$E\left[ \frac{1}{n} \sum_{\lambda_k(Z(\tilde{A}_n)) < \varepsilon} \log \frac{1}{\lambda_k(Z(\tilde{A}_n))} \right]
\le \frac{\varepsilon |\log \varepsilon| (n + \lceil n\theta \rceil)}{n} \sum_{k=1}^{\lceil n\theta \rceil} E\left[ \frac{1}{x_k^T (D_k P_k D_k) x_k} \right]. \tag{3.11}$$

Further bounding above requires controlling the eigenvalues of $D_k P_k D_k$
from below. This was previously accomplished by a Rayleigh-Ritz argument
(Lemma 3.1). In the case that there are some number of zero entries this
needs to be replaced by the more sophisticated inequalities of Fan [6].

Note that with the number of zeros in any column bounded by $\lceil n\gamma \rceil$,
$P_k$ still projects onto an $(n - \lceil n\theta \rceil + 1)$-dimensional subspace (a.s.). The
problem lies in the zeros on the diagonal of $D_k$.

Now for any invertible nonnegative Hermitian matrix $M_1$ and nonnegative
Hermitian $M_2$, Fan [6] gives us that

$$\lambda^{M_2}_{i+j+1} \le \lambda^{M_1 M_2}_{i+1} \, \lambda^{M_1^{-1}}_{j+1}, \tag{3.12}$$

in which $\lambda_i^M$ is the $i$th largest eigenvalue of the matrix $M$. By
continuity this inequality still holds when $M_1$ has some zero eigenvalues. It is
to be applied in this setting with $M_1 = P_k$ and $M_2 = D_k^2$ (the eigenvalues of
$D_k P_k D_k$ and $P_k D_k^2$ being the same). With that, (3.12) reduces to

$$\lambda^{D_k^2}_{\lceil n\theta \rceil + i} \le \lambda^{D_k P_k D_k}_{i+1}.$$

Therefore, if $D_k^2$ has $n - \lceil \gamma n \rceil$ eigenvalues larger than $a$ (or the $k$th column of
$A_n$ has at least that many entries similarly bounded below), then $D_k P_k D_k$
has at least $n - \lceil n\gamma \rceil - \lceil n\theta \rceil$ eigenvalues larger than $a$.

By assumption the above holds for each $k$ and $1 - \gamma - \theta > 0$. Thus, the
Gaussian quadratic form in the denominator of (3.11) is stochastically larger
than $a$ times a $\chi^2$ random variable of degree at least $(1/2)(1 - \theta - \gamma)n$ for all
large enough $n$. The right-hand side of (3.11) is then bounded by a constant
(depending on $\theta$, $\gamma$ and $a$) times $\varepsilon |\log \varepsilon|$ and the statement follows. □

4. The flat case. This section is devoted to a study of the flat case 
$A = J_{nm}$, $n \ge m$. This special case is typically referred to as the Laguerre
or Wishart Ensemble in the random matrix theory literature. Of course, 
per J nm is easily computed, and there is no need for an approximate algo- 
rithm. However, we wish to emphasize two points in this simpler setting 
which suggest our general concentration result for the permanent may not 
be optimal. 

The first point focusses on the strictly rectangular case. We have the 
following: 

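Proposition 4.1. Let $n \ge m_n$, $n, m_n \in \mathbb{N}$, and assume that $\{x_{ij}\}_{i,j=1}^{n,m_n}$
are independent identically distributed $N(0, 1)$ random variables. Suppose
that

$$\limsup_{n \to \infty} \frac{m_n}{n} \le \theta < 1. \tag{4.1}$$

Then for any sequence $s_n$ diverging to $\infty$,

$$\lim_{n \to \infty} P\left( \frac{1}{s_n} \left| \log \det Z(J_{n m_n}) - \log \operatorname{per} J_{n m_n} \right| > \delta \right) = 0.$$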

On the other hand, for flat matrices of more general shape we introduce a 
new polynomial-time estimator that approximates the permanent to within 
order one error. The statement follows. 

Proposition 4.2. Define $Y_n = n^{-(2+p)} \sum_{k=1}^{n^{2+p}} X_k^n$, in which each $X_k^n$ is
an independent copy of $\det(Z(J_{nm}))$ with $n \ge m$ and $p > 0$. It holds that

(4.2) P((l - 5) per J nm <Y n <(l + 5) per J nm ) > 1 - -L 
for all n>2. 






Both propositions are easily explained. The first is a consequence of the
nice result of Silverstein [13] which says that if (4.1) holds, then $\lambda_1(Z(\tilde{J}_{n m_n}))$
converges in probability to a positive constant as $n \to \infty$. In other words, in
this setting Condition 2.1 trivially holds for all $\varepsilon$ small enough.

Proposition 4.2 makes use of the well-known result (see again [13]) that
the determinant of $Z(J_{nm})$ has the distribution $\chi^2_n \chi^2_{n-1} \cdots \chi^2_{n-m+1}$. Here the
notation refers to the distribution of the product of independent random
variables with the indicated $\chi^2$ distributions. A proof of this fact may be
drawn from revisiting Lemma 3.1, as follows. Let $A$ denote an element of
$M(n, m, \mathbb{R}_+)$ and $A_k$ the matrix formed by removing the $k$th column. Again
bring in the random matrix $X(A)$ with columns $\tilde{x}_1, \ldots, \tilde{x}_m$, where $\tilde{x}_k = D_k x_k$.
Identity (3.3) of Lemma 3.1 provides that

$$\det(Z(A)) = \det(Z(A_m)) [x_m^T D_m P_m D_m x_m] = \prod_{k=1}^m [x_k^T D_k P_k D_k x_k] = \prod_{k=1}^m \Big[ \sum_{i=1}^{n-k+1} \lambda_{i+k-1}(D_k P_k D_k) \, x_{ik}^2 \Big], \tag{4.3}$$

in which the last equality is in law with the $\{x_{ik}\}$ independent standard
Gaussians. The flat case $A = J_{nm}$ is reflected in $D_k = I$ for all $k$, which
is to say that $\lambda_i(D_k P_k D_k) = \lambda_i(P_k) = 1$ when $i \ge k$. The advertised
distributional identity follows.

The use of this is in computing moments of $\det Z(J_{nm})$. That is,

$$\operatorname{per} J_{nm} = E[\det Z(J_{nm})] = E\Big[ \prod_{k=n-m+1}^{n} \chi^2_k \Big] = \frac{n!}{(n-m)!},$$

which we knew before, but now also

$$E[(\det Z(J_{nm}))^2] = E\Big[ \prod_{k=n-m+1}^{n} (\chi^2_k)^2 \Big] = \prod_{k=n-m+1}^{n} (k^2 + 2k) = \frac{n!}{(n-m)!} \prod_{k=n-m+1}^{n} (k+2) \le (n+1)^2 \Big( \frac{n!}{(n-m)!} \Big)^2.$$

With this estimate, the proof of Proposition 4.2 follows easily from
Chebyshev's inequality.
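
The moment computation and the averaged estimator can be sketched numerically. The code below uses $E[\chi^2_k] = k$ and $E[(\chi^2_k)^2] = k^2 + 2k$ for the exact moments and samples $\det Z(J_{nm})$ through the product representation; the function names, the seed and the number of copies in the example are illustrative assumptions (the proposition takes $n^{2+p}$ copies).

```python
import math
import random

def first_moment(n, m):
    """E[det Z(J_nm)] = prod_{k=n-m+1}^{n} k = n!/(n-m)! = per J_nm."""
    return math.prod(range(n - m + 1, n + 1))

def second_moment(n, m):
    """E[(det Z(J_nm))^2] = prod_{k=n-m+1}^{n} (k^2 + 2k)."""
    return math.prod(k * k + 2 * k for k in range(n - m + 1, n + 1))

def sample_det(n, m, rng):
    """One draw of det Z(J_nm) as a product of independent chi-squares
    (chi2_k is Gamma with shape k/2 and scale 2)."""
    return math.prod(rng.gammavariate(k / 2.0, 2.0) for k in range(n - m + 1, n + 1))

def averaged_estimator(n, m, copies, rng):
    """Average of `copies` independent draws of det Z(J_nm)."""
    return sum(sample_det(n, m, rng) for _ in range(copies)) / copies

n, m = 6, 3
ratio = second_moment(n, m) / first_moment(n, m) ** 2
print(first_moment(n, m), ratio, (n + 1) ** 2)
print(averaged_estimator(n, m, 20000, random.Random(0)))
```

The ratio of the second moment to the squared first moment grows at most quadratically in $n$, which is why polynomially many copies suffice for an order-one relative error.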

The question posed here is whether either approach [taking advantage of
either the restrictive geometry as in (4.1) or the determinantal formula (4.3)]
might lead to similarly sharp concentration in the more general $M(n, m, [a, b])$
case. Believing that this should be so really comes down to believing that the
bottom of the spectrum of $Z(A)$ for $A \in M(n, m, [a, b])$ is not much worse
than that of $Z(J_{nm})$. Of course, providing support for the latter statement
has been the main technical goal of the present work.



APPENDIX 

A large part of the above argument entailed proving a certain integrability 
of the logarithm of the small eigenvalues of a Wishart-type matrix. It is 
interesting that the issue of controlling the bottom of the spectrum comes 
up in a great many problems (see, once more, [1] for an example). While 
not directly relevant for the study of the permanent, we wish to point out 
in this appendix that an exact analysis of the flat case ($J_n$) reveals a much
stronger integrability than that proved in Proposition 3.1. It is natural (and
an underlying theme of this paper) to suppose the actuality of the more
general case $M(n, [a, b])$ is similar. For brevity we present the computation
in the complex setting (the computation in the real case employs Pfaffians
and requires nontrivial modifications), where $Y(J_n)_{ij} = (1/\sqrt{n}) \, x_{ij}^{\mathbb{C}}$ with
$x_{ij}^{\mathbb{C}}$ independent standard complex Gaussians. In fact, we note that in this case [12]
have computed the law of the determinant of $Y(J_n)^* Y(J_n)$.
Our result is the following.

Proposition A.1. For $Y(J_n)$, an $n \times n$ matrix with entries independent
complex Gaussians of mean 0 and variance $1/n$, the eigenvalues $\lambda_i$ of
$Y(J_n)^* Y(J_n)$ satisfy

$$\lim_{\varepsilon \to 0} \lim_{n \to \infty} E\Big[ \frac{1}{n} \sum_{\lambda_i < \varepsilon} \lambda_i^{-\alpha} \Big] = 0$$

for any $\alpha < 1/2$.

Proof. The present ensemble is integrable in the sense that the joint
density of the eigenvalues $\lambda_1, \ldots, \lambda_n$ is known explicitly [9]:



P(Ai,A 2 ,...,A n ) 



(A.l) 



C n exp 
1 



-n 



i=l J i<j 



■ dot 



ni 



I1( A *- A ;)' 



n-l 



k=0 



0<i,j<n-l 



where $L_k^0$ denote the Laguerre polynomials: the family $\{L_k^p\}$ is orthogonal with
respect to the weight $x^p e^{-x}$ on $[0, \infty)$. From the determinantal formula (A.1), one may derive
the eigenvalue density

$$p_n(x) = E\Big[ \frac{1}{n} \sum_{k=1}^n \delta_{\lambda_k}(x) \Big] = e^{-nx} \sum_{k=0}^{n-1} (L_k^0(nx))^2.$$




By Christoffel-Darboux and the rules 

there is also the form 
p n (x) = ne~ n 

(XX 

= ne- nx \L l n {nx)L l n _ 2 {nx) - (L^nx)) 2 



^ k (x) = -L^l and iLt^CJW+C;^) 



— L^nx^^nx) - L° n (nx)^-L° n _ x (nx) 



Thus, the integral to be examined is 

(A.2) n f x- a [L l n {nx)L l n _ 2 {nx) - (L^nx)) 2 ^-™ dx. 



Near zero, it is known [14] that e~ x l 2 L\{x) < Cnx 2 for < x < K/n with 
a large constant K, which allows one to dispose of the integral (A.2) for
x < Kj n 2 : either term is of order 



K/n 2 



x- a {e- nx/2 L l n {nx)fdx < Cn 2+C 



K/n 



A— a 



dx = 0{n~ 3+2a ) 



For what remains, one needs the following (see [5]): uniformly on < z < cv 
(c< l,u = 4n + 4), 

e~ z/2 L l n (z) 

(A.3) 



( n-l)i(l(,/n) Nl/2 



Ji(vi/>(z/n))+0 



3/2 



n 



f{vijj(z/n)) 



where ip(t) = (1/2)V* - t 2 + (1/2) sin" 1 f(t) =t for t < 1, t~ 1/2 otherwise 
and Ji is the Bessel function. We consider z = nx,x < e C 1, on which 
i/j' /ip(z/n) ~ y/x. Note also that 

J~n~/2J\{z) ~ -^= cos(z — 37r/4) for z | oo. 

Substituting (A.3) into the restricted integral (A.2), we first consider 
terms involving the second factor in (A.3). On the range K/n 2 < x < e we 
have -*02 f {w4>{z / n)) < c??," 3 / 2 ^ 1 / 4 and J\{vtjj{z / n)) < c/y/nx, yielding the 



contributions of order 



n / x^il/^fix 1 ^/^' 2 ) 2 dx = 0(n- 2+2a ) 

JK/n 2 
and

n ^ x- a (l/^) 2 (l/n 1/2 x 1/4 )(x 1/4 /n 3/2 )dx = 0(n~ 1+2a ), 

JK/n 2 

both vanishing for n — > oo as soon as a < 1/2. That leaves us with 
x~ a p n (x) dx 

re l 



n 



x- a -[Ji{(n + Ji((n - l)y/x) - Jf(ny/x)] dx 

K/n 2 X 

1 r £ 1 

~ (n -l+2« 

n , 



T x- a -J 1 ((n + l)^)J 1 ((n-l)Vx)dx + 0(r 

JK/n 2 X 



For the first term on the right-hand side, the integrand is overestimated as in 
| cos((n + l)yfx ) cos((n — l)y/x) — cos 2 (n v / x )| < x for < x < e. The remain- 
ing integral is then controlled by a constant multiple of Jx/ n 2 x -1 / 2-Q dx = 
£ i/2-q _ Q^ n 2a— ly fp^g sec0 nd term is even easier: the bound J\{nz) < 
C ' I 'y/nz shows it to be of order (1/n 2 ) J#/ n 2 x~ 3 / 2 ~ a dx ~ n 2a_1 . □ 